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Abstract 

The overlap Dirac operator in lattice QCD requires the computation of the sign function of a matrix. While this matrix is usually 
Hermitian, it becomes non-Hermitian in the presence of a quark chemical potential. We show how the action of the sign function 
of a non-Hermitian matrix on an arbitrary vector can be computed efficiently on large lattices by an iterative method. A Krylov 
subspace approximation based on the Arnoldi algorithm is described for the evaluation of a generic matrix function. The efficiency 
of the method is spoiled when the matrix has eigenvalues close to a function discontinuity. This is cured by adding a small number 
of critical eigenvectors to the Krylov subspace, for which we propose two different deflation schemes. The ensuing modified Arnoldi 
method is then applied to the sign function, which has a discontinuity along the imaginary axis. The numerical results clearly show 
the improved efficiency of the method. Our modification is particularly effective when the action of the sign function of the same 
matrix has to be computed many times on different vectors, e.g., if the overlap Dirac operator is inverted using an iterative method. 
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1. Introduction 

The only systematic nonperturbative approach to quan- 
tum chromodynamics (QCD) is the numerical simulation 
of the theory on a finite space-time lattice. For a long time, 
the implementation of chiral symmetry on the lattice posed 
serious problems [1] , but these problems have recently been 
solved in a number of complementary ways. Perhaps the 
most prominent solution is the overlap Dirac operator [2] 
which provides an exact solution of the Ginsparg- Wilson 
relation [3]. However, the price one has to pay for this solu- 
tion is the numerical computation of the sign function of a 
sparse matrix A of dimension N. Here and in the following, 
computing some function f of a matrix A is a short-hand for 
computing f(A) ■ x, where x G C N , i.e., determining the ac- 
tion of f(A) on the vector x. Typically A is Hermitian, and 
efficient methods to compute its sign function have been 
developed for this case [4,5]. 

The phase diagram of QCD is currently being explored 
experimentally in relativistic heavy ion collisions and the- 
oretically in lattice simulations and model calculations [6] . 
To describe QCD at nonzero density, a quark chemical po- 
tential is introduced in the QCD Lagrangian. If this chem- 



ical potential is implemented in the overlap operator [7], 
the matrix A loses its Hermiticity, and one is faced with the 
problem of computing the sign function of a non-Hermitian 
sparse matrix. On a small lattice, this can be done by per- 
forming a full diagonalization and using the spectral ma- 
trix function definition (see Eq. (4) below), but on larger 
lattices one needs to resort to iterative methods to keep the 
computation time and memory requirements under control. 

The purpose of this paper is to introduce such an iterative 
method. In the next section we describe the non-Hermitian 
problem in more detail and briefly discuss the sign func- 
tion for non-Hermitian matrices. In Sec. 4 we propose an 
Arnoldi-bascd method to make a Krylov subspace approx- 
imation of a generic matrix function. The efficiency of this 
method is poor when computing the sign function of a ma- 
trix having eigenvalues with small absolute real parts. This 
is caused by the discontinuity of the sign function along the 
imaginary axis. In Sec. 5 we enhance the Arnoldi method 
by taking into account exact information about these crit- 
ical eigenvalues. We use the method to compute the sign 
function occurring in the overlap Dirac operator of lattice 
QCD, and in Sec. 6 we discuss the results obtained for two 
different lattice sizes. 
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2. The overlap operator and the sign function 

The overlap formulation of the Dirac operator is a rig- 
orous method to preserve chiral symmetry at finite lat- 
tice spacing in a vector-like gauge theory. Its construction 
is based on successive developments described in seminal 
papers by Kaplan, Shamir, Furman, Narayanan and Ncu- 
berger [8-10,2]. In the presence of a non-zero quark chemi- 
cal potential (i, the massless overlap Dirac operator is given 

by [7] 

D ov (/i) = l + 75 sgn(F w (M)), (1) 
where sgn stands for the sign function, H w (fj,) = j^D w (n), 
D w (fj,) is the Wilson-Dirac operator at nonzero chemical 
potential [11,12] with negative Wilson mass m w <E (—2,0), 
75 = 71727374, and 7„ with v — 1, ... ,4 are the Dirac 
gamma matrices in Euclidean space. The Wilson-Dirac op- 
erator is a discretized version of the continuum Dirac op- 
erator that avoids the replication of fermion species which 
occurs when a naive discretization of the derivative opera- 
tor is used. It is given by 

[^w(p)]nm — 3n,m (2) 
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where k = 1/(8 + 2m w ) and U n u is the 5L/(3)-matrix asso- 
ciated with the link connecting the lattice site n to n + v. 
The exponential factors e ±AI are responsible for the non- 
Hcrmiticity of the operator. The quark field at each lattice 
site corresponds to 12 variables: 3 SU(3) color components 
x 4 Dirac spinor components. 

For fx ^ the argument H w (fi) of the sign function be- 
comes non-Hermitian, and we need to define the sign func- 
tion for this case. Consider first a given matrix A with no 
particular symmetry properties and a function /. Let T be 
a collection of closed contours in C such that / is analytic 
inside and on T and such that T encloses the spectrum of 
A. Then the function f(A) of the matrix A can be defined 
by [13] 

f(A)^^-If(z)(zI-A)- 1 dz, (3) 

where the integral is defined component-wise and / denotes 
the identity matrix. 

From this definition it is easy to derive a spectral function 
definition, even if the matrix is non-Hermitian. If the ma- 
trix A is diagonalizable, i.e., A = [/AC/ -1 with a diagonal 
eigenvalue matrix A = diag(Ai) and U € Gl(N, C), then 

f(A) = UfiAp- 1 (4) 



with 



/(A) = diag(/(A i )) . 



(5) 



If A cannot be diagonalized, a more general spectral defini- 
tion of f(A) can be derived from Eq. (3) using the Jordan 
decomposition A = U(^ i Ji)!/" 1 with Jordan blocks 
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with mi the size of the Jordan block, see [14]. The super- 
scripts denote derivatives of the function with respect to 
its argument. 

Non-Hermitian matrices typically have complex eigen- 
values, and applying Eq. (4) or (7) to the sign function in 
Eq. (1) requires the evaluation of the sign of a complex 
number. The sign function needs to satisfy [sgn(z)] 2 = 1 
and, for real x, sgn(x) = ±1 if x ^ 0. These properties are 
satisfied if one defines 

sgn(z) = —j= = sgn(Re(z)) , (9) 



where in the last equality the cut of the square root is 
chosen along the negative real axis. Using the definition (9) 
in the spectral definition (7) yields a matrix sign function 
in agreement with that used in [15-18]. Indeed, based on 
the general Jordan decomposition 



A = U 



J4 



u- 1 



(10) 



where J + represents the Jordan blocks corresponding to 
the eigenvalues with positive real part and J_ those cor- 
responding to the eigenvalues with negative real part, the 
spectral definition (7) for the sign function becomes 



sgn(A) = U 



u- 



-I 



(11) 



see also [19]. This definition agrees with the result one ob- 
tains when deriving Eq. (1) from the domain- wall fermion 
formalism at /i 7^ in the limit in which the extent of the 
fifth dimension goes to infinity [20] . 

For any square matrix A we have sgn(A) 2 = /, and a 
short calculation [7] shows that for this reason the overlap 
operator D ov (^l) as defined in Eq. (1) satisfies the Ginsparg- 
Wilson relation 
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(12) 



2 



If A is Hermitian, the polar factor pol(A) = A(A^A)~ 1/2 
of A coincides with sgn(A), and this fact has been used 
successfully to develop efficient iterative methods for com- 
puting the action of the matrix sign function on a vec- 
tor [21]. However, if A is non-Hcrmitian, then in general 
sgn(^4) 7^ pol(A) and pol(A) 2 ^ I. Thus, for /i^O, re- 
placing sgn(ff w ) by pol(lf w ) in the definition of the overlap 
operator in Eq. (1) not only changes the operator but also 
violates the Ginsparg- Wilson relation, as can also be seen 
in numerical experiments. We conclude that the definition 
given in Eq. (1) is the correct formulation of the overlap 
operator for /i =/= 0. 

3. Direct and iterative methods 

A numerical implementation of the sign function using 
the spectral definition (4) is only possible for small matri- 
ces, as a full diagonalization becomes too expensive as the 
matrix grows. As an alternative, matrix-based iterative al- 
gorithms for the computation of the matrix sign function 
have been around for many years [15,17,22,23]. Although 
these algorithms are much faster than the direct implemen- 
tation of the spectral definition for medium-sized problems, 
they still require the storage and manipulation (i.e., inver- 
sions and/or multiplications) of the entire matrix. This is 
feasible for medium-sized matrices, but becomes too expen- 
sive for the very large matrices occurring in typical lattice 
QCD simulations. E.g., even for an 8 3 x 8 lattice, which is 
the minimum lattice size required for a physically relevant 
problem, the matrix dimension is already 12 -8 3 - 8 w 50 000. 
Even though these QCD matrices are sparse, the iterative 
procedure computing the sign function fills the matrix as 
the iterations proceed, and the method eventually becomes 
prohibitively expensive. 

Therefore, another iterative method is required which 
docs not produce an approximation to the full sign matrix 
itself, but rather produces an approximation to the vec- 
tor y = sgn(A)x, i.e., to the operation of the sign matrix 
on an arbitrary vector. Many QCD applications only re- 
quire the knowledge of this product for a number of selected 
source vectors x. For instance, some low-energy properties 
of QCD can be described by the lowest-lying eigenvalues 
of the Dirac operator. These eigenvalues can efficiently be 
found by an iterative eigenvalue solver like ARPACK [24] , 
which only requires the computation of matrix- vector mul- 
tiplications. Analogously, the computation of the propaga- 
tion of fcrmions can be well approximated by inverting the 
Dirac operator on a selected number of source vectors bk, 
i.e., the solution of the systems D ov x = bk- These inversions 
are also performed using iterative linear solvers requiring 
only matrix- vector multiplications. 

Such iterative methods, mostly from the class of Krylov 
subspace methods, are already extensively used for the so- 
lution of eigenvalue problems, linear systems, and for func- 
tion evaluations [25,26] with Hermitian matrices. There, 
the ancestor of all methods is the Lanczos method, of which 



many variants and improvements have been built over the 
years. The Lanczos method makes use of short recurrences 
to build an orthonormal basis in the Krylov subspace. 

Krylov subspace methods are also used for non- 
Hcrmitian matrices in the context of eigenvalue problems 
(a popular example being the restarted Arnoldi method of 
ARPACK), for the solution of linear systems, and even for 
the evaluation of the exponential function [27,28]. The two 
most widely used methods are the Arnoldi method and the 
two-sided Lanczos method. In contrast to the Hermitian 
case, the Arnoldi method requires long recurrences to con- 
struct an orthonormal basis for the Krylov subspace, while 
the two-sided Lanczos method uses two short recurrence 
relations, but at the cost of losing orthogonality. 

In the next section we present an Arnoldi-bascd method 
to compute a generic function of a non-Hcrmitian matrix. 
The application of the two-sided Lanczos method to this 
problem will be investigated in a separate publication. 

4. Arnoldi function approximation for a 
non-Hermitian matrix 

From the spectral definition (4) it follows that f(A) is 
identical to some polynomial Pk-i{A) of degree K — 1 < 
N. Indeed, the unique interpolation polynomial Pk-x{z), 
which interpolates f(z) at the different eigenvalues A^ of 
A, satisfies f(A) = Pk-i(A), as follows immediately from 
Eq. (4). If A has non-trivial Jordan blocks, this is still true, 
but interpolation is now in the Hermite sense, where at 
each eigenvalue Pr-i interpolates / and its derivatives up 
to order one less than the size of the largest corresponding 
Jordan block, see [19]. 

Hence, for an arbitrary vector x, 

K-l 

y = f{A)x = P K -x{A)x = d Ai x ■ (13) 

i=0 

The idea is to construct an approximation to y using a 
polynomial of degree k— 1 with k -C N. One possibility is to 
construct a good polynomial approximation Pk-i(A) such 
that Pfe_i(Ai) rts /(A*). This would yield a single polyno- 
mial approximation operating on any vector x to compute 
f(A)x ?s Pfc_i(_A)x. One such example is the expansion in 
terms of Chcbyshcv polynomials up to degree k — 1. Al- 
though Chcbyshcv polynomials are very close to the mini- 
max polynomial for a function approximation over an ap- 
propriate ellipse in the complex plane, one can do better for 
matrix function approximations. First of all, one only needs 
a good approximation to / at the eigenvalues of A, and 
secondly, one can use information about the source vector 
x to improve the polynomial approximation. The vector x 
can be decomposed using the (generalized) eigenvectors of 
A as a complete basis, and clearly some eigendircctions will 
be more relevant than others in the function approxima- 
tion. Using a fixed interpolation polynomial does not use 
any information about the vector x. Furthermore, the only 
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feature of A usually taken into account by such polynomial 
approximations is the extent of its spectrum. 

Indeed, there exists a best polynomial approximation 
y = Pk-i(A)x of degree at most k — 1, which is readily 
defined as the orthogonal projection of y on the Krylov 
subspace K k {A, x) = span(:r, Ax, A 2 x, . . . , A k ~ 1 x). If V k = 
. . . ,v k ) is an N x k matrix whose columns form an 
orthonormal basis in lC k , then V k V^ is a projector on the 
Krylov space, and the projection is y = VkV^y. 

The operation of any polynomial of A of degree smaller 
than k on x is also an element of K-k, and the projection 
y corresponds to the polynomial which minimizes A = 
f(A)x — Pk-i(A)x, as A _L )Ck(A, x). Clearly, this approx- 
imation explicitly takes into account information about A 
and x. 

The problem, however, is that to find this best polyno- 
mial approximation of degree k — 1 one already has to know 
the answer y = f(A)x. Therefore, we need a method to ap- 
proximate the projected vector y. This can be done by one 
of the Krylov subspace methods mentioned in Sec. 3. Here, 
we use the Arnoldi algorithm to construct an orthonormal 
basis for the Krylov subspace K. k (A,x) using the long re- 
currence 

AV k =V k H k +0 k v k+1 el , (14) 
where v i = x/{3, /3 = \x\, H k is an upper Hcsscnbcrg matrix 
(upper triangular + one subdiagonal), k = Hk+i.k, and 
e k is the A;-th basis vector in C fe . The projection y of f(A)x 
on ]Ck(A, x) can be written as 

y = V k Vlf{A)x . (15) 

Making use of x = (iv\ = /314ei, Eq. (15) becomes 

V = PV k Vlf{A)V k e 1 . (16) 

From Eq. (14) it is easy to see that 

H k = VlAVk (17) 

as V^Vk = Ik and Vk+i -L Vk- Therefore it seems natural 
to introduce the approximation [27] 

f(Hk) « V k U{A)Vk (18) 

in Eq. (16), which finally yields 

y « pV k f(H k ) ei . (19) 

This expression is just a linear combination of the k Arnoldi 
vectors Vi, where the k coefficients arc given by (3 times 
the first column of f(H k ). Saad [29] showed that the ap- 
proximation (19) corresponds to replacing the polynomial 
interpolating / at the eigenvalues of A by the lower-degree 
polynomial which interpolates / at the eigenvalues of Hk, 
which are also called the Ritz values of A in the Krylov 
space. The hope is that for k not too large the approxima- 
tion (19) will be a suitable approximation for y. The ap- 
proximation of f(A)x by Eq. (19) replaces the computation 
of f(A) by that of f(Hk), where Hk is of much smaller size 
than A. The evaluation of (the first column of) f(Hk) can 
be implemented using a spectral decomposition, or another 
suitable evaluation method [15,17,22,23]. 



The long recurrences of the Arnoldi method make the 
method much slower than the Lanczos method used for 
Hermitian matrices. Nevertheless, our first results showed 
consistent convergence properties when computing the ma- 
trix sign function: as the size of the Krylov space increases, 
the method converges to within machine accuracy. More 
precisely, for the sign function the method shows a see-saw 
profile corresponding to even-odd numbers of Krylov vec- 
tors, as was previously noticed for the Hermitian case as 
well [5] . This even-odd pattern is related to the sign func- 
tion being odd in its argument. The see-saw effect is com- 
pletely removed when using the Harmonic Ritz values as 
described in Ref. [5] for Hermitian matrices and extended 
to non-Hermitian matrices in Ref. [28] , and convergence be- 
comes smooth. Alternatively, one can just as well restrict 
the calculations to even-sized Krylov subspaces. 

Unfortunately, in the case of the sign function the Arnoldi 
method described above has a very poor efficiency when 
some of the eigenvalues are small. In the case considered 
in Fig. 3 (see Sec. 6 below), the size of the Krylov space 
has to be taken very large (fc « N/2) to reach accuracies 
of the order of 10 -8 (see the m = curve in the top pane 
of Fig. 3). A discussion and resolution of this problem are 
given in the next section. 

5. Deflation 

5.1. Introduction 

For Hermitian matrices, it is well known that the compu- 
tation of the sign function can be improved by deflating the 
eigenvalues smallest in absolute value [5] . The reason why 
this is crucial and specific to the sign function is the dis- 
continuity of the sign function at zero. For non-Hermitian 
matrices, the situation is analogous since the sign function 
now has a discontinuity along the imaginary axis. A nec- 
essary condition for the method described in Sec. 4 to be 
efficient is the ability to approximate / well at the eigenval- 
ues of A by a low-order polynomial. If the gap between the 
eigenvalues of A to the left and to the right of the imagi- 
nary axis is small, no low-order polynomial will exist which 
will be accurate enough for all eigenvalues. 

The idea is to resolve this problem by treating these crit- 
ical eigenvalues exactly, and performing the Krylov sub- 
space approximation on a deflated space. 

In the Hermitian case, deflation is straightforward. The 
function f(A) of a Hermitian matrix A with eigenvalues 
Aj and orthonormal eigenvectors itj (i = 1, . . . , N) can be 
written as 

N 

f(A)=Y,f^ u * u l ( 2 °) 

i=i 

and its operation on an arbitrary vector x as 

N 

f(A)x = J2f(^M^i- (21) 

i=l 



4 



If m critical eigenvalues of A and their corresponding eigen- 
vectors have been computed, one can split the vector space 
into two orthogonal subspaces and write an arbitrary vec- 
tor as x = x\\ + x±, where xn = Y^iLi( u l x ) u i an d x -J- = 
x — x\\. Eq. (21) can then be rewritten as 



f(A)x = Y / f(\ i )(ulx)u i + f(A)xj 



(22) 



The first term on the right-hand side of Eq. (22) can be 
computed exactly, and the second term can be approxi- 
mated by applying the Arnoldi method of Sec. 4 to x±. 
As the vector x± does not contain any contribution in the 
eigenvector directions corresponding to the critical eigen- 
values, the polynomial approximation no longer needs to 
interpolate / at the eigenvalues closest to the function dis- 
continuity to approximate f(A)x± well. Therefore, after 
deflation, lower-degree polynomials will yield the required 
accuracy, and a smaller-sized Krylov subspace can be used 
in the approximation. In theory, the orthonormality of the 
eigenvectors guarantees that the Krylov subspace will be 
perpendicular to x» , but in practice numerical inaccuracies 
could require us to reorthogonalizc the subspaces during 
the construction of the Krylov subspace. 

For non-Hermitian matrices the (generalized) eigenvec- 
tors are no longer orthonormal, and it is not immediately 
clear how to deflate a critical subspace. The matrix func- 
tions as defined in (4) or (7) involve the inverse of the ma- 
trix of basis vectors, U , and no simple decomposition into 
orthogonal subspaces can be performed. 

In the remainder of this section, we will develop two alter- 
native deflation schemes for the non-Hermitian case, using 
a composite subspace generated by adding a small number 
of critical eigenvectors to the Krylov subspace. This idea of 
an augmented Krylov subspace method has been used in the 
iterative solution of linear systems for some time, see, e.g., 
Rcf. [30]. Since in computational practice one will never 
encounter non-trivial Jordan blocks, we assume in the fol- 
lowing, for simplicity, that the matrix is diagonalizable. 



5.2. Schur deflation 

We construct the subspace fi m + ICk(A, x), which is the 
sum of the subspace f2 m spanned by the eigenvectors cor- 
responding to m critical eigenvalues of A and the Krylov 
subspace ICk(A, x). The aim is to make an approximation 
similar to that of Eq. (19), but to treat the contribution 
of the critical eigenvalues to the sign function explicitly so 
that the size of the Krylov subspace can be kept small. 

Assume that m critical eigenvalues and right eigenvectors 
of A are determined using an iterative eigenvalue solver like 
the one implemented in ARPACK. From this one can easily 
construct m Schur vectors and the corresponding m x m 
upper triangular matrix T m satisfying 



where S m = (si, . . . , s m ) is the N x m matrix formed by 
the orthonormal Schur vectors and the diagonal elements 
of T m are the eigenvalues corresponding to the Schur vec- 
tors. These Schur vectors form an orthonormal basis of the 
eigenvector subspace f2 m , which is invariant under opera- 
tion of A. 

After constructing the m-dimensional subspace O m we 
run a modified Arnoldi method to construct an orthogonal 
basis of the composite subspace fi m + K,k(A,x). That is, 
each Arnoldi vector is orthogonalized not only against the 
previous ones, but also against the Schur vectors Sj. In 
analogy to (14), this process can be summarized as 



MS m V h 



Sn 




PkVk+ie m+k 



(24) 

Here, the columns V\, . . . ,Vk of Vk form an orthonormal 
basis of the space K,^(A, x), which is the projection of the 
Krylov subspace K-u {A, x) onto the orthogonal complement 
Q, of fl m . In particular, v\ — x±/ft, where x± — (1 — 
SmSjfJx is the projection of x onto Q 1 - and (3 = \x±\. 
Again, Hk is an upper Hcssenbcrg matrix. 

Note that the orthogonality of JC^ with respect to fl m 
has to be enforced explicitly during the Arnoldi iterations, 
as the operation of A on a vector in the projected Krylov 
subspace K,^ in general gets a contribution belonging to 
O m , i.e., K,j: is not invariant under the operation of A. This 
is a consequence of the non-orthogonality of the eigenvec- 
tors of A. 

Defining 



Q=(S m Vk) and H 



SlAV k 
H k 



(25) 



H satisfies a relation similar to Eq. (17), namely 

H = Q^AQ , (26) 

and the function approximation derived in Sec. 4 can be 
used here as well. We briefly repeat the steps of Sec. 4. 
The operation of the matrix function f(A) on the vector 
x can be approximated by its projection on the composite 
subspace, 

f(A)x « QQ^f(A)x , (27) 
and because x lies in the subspace, 



f(A)x « QQif(A)QQi 



(28) 



As H satisfies Eq. (26), we can introduce the same approx- 
imation as in Eq. (18), 

f(H) a Q f f(A)Q , (29) 

and substituting Eq. (29) in Eq. (28) we construct the func- 
tion approximation 



f(A)x « Qf{H)Q^x 



(30) 



ASn 



(23) 



Because of the block structure (25) of the composite Hes- 
senberg matrix H, the matrix f(H) can be written as 
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f(H) = 



f(T m ) Y 
f(H k ) l 



(31) 



The upper left corner is the function of the triangular Schur 
matrix T m (which is again triangular), and the lower right 
corner is the (dense) matrix function of the Arnoldi Hes- 
senberg matrix H k . The upper right corner reflects the cou- 
pling between both subspaces and is given by the solution 
of the Sylvester equation 

T m Y - YH k = f(T rn )X - Xf(H k ) (32) 

with X = S^AVk, which follows from the identity 
f(H)H = Hf(H). Combining (30) and (31), we obtain 



f(Tm) Y 



f{H k ) 




(33) 



Note that V^x 



I x = (3e\. Therefore only f(T m ) and the first 
column of Y and f(H k ), i.e., the first m + 1 columns of 
f(H), are needed to evaluate (33). This information can be 
computed using the spectral definition (4) or some other 
suitable method [15,17,22,23]. In the case of the sign func- 
tion we chose to use Roberts' iterative method [16] 



(34) 



with S° = A, which converges quadratically to sgn(A). 
Roberts' method is applied to compute sgn(T m ) and 
sgn(iifc). The matrix Y is computed by solving the 
Sylvester equation (32) using the method described in 
Appendix A. 

In the implementation one has to be careful to com- 
pute the deflated eigenvectors to high enough accuracy, as 
this will limit the overall accuracy of the function approx- 
imation. When computing f(A)x for several x, the partial 
Schur decomposition (23) and the triangular matrix f(T m ) 
need to be computed only once. Only the modified Arnoldi 
method must be repeated for each new vector x. 

We summarize our algorithm for approximating f(A)x: 

(i) Determine the eigenvectors for m critical eigenvalues 
of A using ARPACK. Construct and store the corre- 
sponding Schur matrix S m and the upper triangular 
matrix T m = S^ASm. The columns of S m form an 
orthonormal basis of a subspace f2 m . 

(ii) Compute the triangular matrix f(T m ) using (34). 

(iii) Compute x± = (1 — S m SlAx. 

(iv) Construct an orthonormal basis for the projected 
Krylov subspace K~t(A, x±) using a modified Arnoldi 
method. The basis is constructed iteratively by or- 
thogonalizing each new Krylov vector with respect to 
fl m and to all previous Arnoldi vectors, and is stored 
as columns of a matrix V k . Also build the upper 
Hesscnbcrg matrix H — Q' AQ with Q = (S m , V k ). 



(v) Compute (column m + 1 of) f(H) using Roberts' it- 
erative method (34) on H k and solving the Sylvester 
equation (32) for Y as described in Appendix A. 

(vi) Compute the approximation f(A)x ~ Q j '(H)Q' x us- 
ing formula (33). 

If f(A)x has to be computed for several x, only steps (iii)— 
(vi) need to be repeated for each vector x. 

5.3. LR-deflation 

An alternative deflation in the same composite subspace 
fl m + K-k (A, x) can be constructed using both the left and 
right eigenvectors corresponding to the critical eigenval- 
ues. This deflation algorithm is a natural extension of the 
method described in Sec. 5.1 from the Hermitian to the 
non-Hermit ian case. A similar idea has been used for the 
iterative solution of linear systems [31,32]. 

Assume that m critical eigenvalues of A have been com- 
puted together with their corresponding left and right 
eigenvectors by some iterative method like the one pro- 
vided by ARPACK. The right eigenvectors satisfy 



(35) 



with A m the diagonal eigenvalue matrix for the m criti- 



cal eigenvalues and R m = (ri , 



t ) the matrix of right 



eigenvectors (stored as columns). Similarly, the left eigen- 
vectors obey 

LlA = A m L} n , (36) 

where L m = (t\, . . . , £ m ) is the matrix containing the left 
eigenvectors (also stored as columns) . For a non- Hermitian 
matrix, the left and right eigenvectors corresponding to dif- 
ferent eigenvalues are orthogonal (for degenerate eigenval- 
ues linear combinations of the eigenvectors can be formed 
such that this orthogonality property remains valid in gen- 
eral). Furthermore, if the eigenvectors are normalized such 
that l\ri = 1, then clearly L^Rm — I m , and R m L\ n is an 
oblique projector on the subspace Q, m spanned by the right 
eigenvectors. 

Let us now decompose x as 



X = X\\ + Xq , 



(37) 



where x« = RmL^x is an oblique projection of x on f2 m 
and Xq = x — x» . 

Applying f(A) to the decomposition (37) yields 

f(A)x = f(A)R m Llx + f(A)x e . (38) 

The first term on the right-hand side can be evaluated ex- 
actly using 



f{A)R m L] n x = R m f(A m )Ll 



(39) 



which follows from Eq. (35), while the second term can be 
approximated by applying the Arnoldi method described in 
Sec. 4 to the vector xq . An orthonormal basis is constructed 
in the Krylov subspace K, k (A, xq) using the recurrence 

AV k =V k H k +j3 k v k+1 eZ , (40) 
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where v\ = xq/ (3 and f3 = \xq\. By construction, xq has 
no components along the m critical (right) eigendirections, 
and successive operations of A will yield no contributions 
along these directions either, hence K.k(A, xq) does not mix 
with f2 m . In principle, numerical inaccuracies accumulated 
during the Arnoldi iterations might make it necessary to 
occasionally re-extract spurious components along the crit- 
ical eigendirections. However, this turned out not to be nec- 
essary in our numerical calculations. 

Applying the Arnoldi approximation (19) to Eq. (38) 
yields the final approximation 

f(A)x « R m f(A m )Llx + pV k f(H k ) ei . (41) 

Note that again only the first column of f(H k ) is needed 
to evaluate Eq. (41). As before, /(-fffc) has to be computed 
using some suitable method. The function sgn(fffc) can be 
efficiently computed using Roberts' algorithm (34). 

We summarize our algorithm for approximating f(A)x 
in the LR-deflation scheme: 

(i) Determine the left and right eigenvectors for m criti- 
cal eigenvalues of A using ARPACK. Store the corre- 
sponding eigenvector matrices L m and R m . 

(ii) Compute f(Xi) (i = 1, . . . , m) for the critical eigen- 
values. 

(iii) Compute Xq = (1 — RmLJ^x. 

(iv) Construct an orthonormal basis for the Krylov sub- 
space lCk(A 7 Xq) using the Arnoldi recurrence. The ba- 
sis is constructed itcratively by orthogonalizing each 
new Krylov vector with respect to all previous Arnoldi 
vectors, and is stored as columns of a matrix Also 
build the upper Hessenberg matrix Hk = V^AVk- 

(v) Compute (the first column of) f{Hk) using Roberts' 
iterative method (34). 

(vi) Compute the approximation to f(A)x using Eq. (41). 
If f(A)x has to be computed for several x, only steps (iii)— 
(vi) need to be repeated for each vector x. 

5.4. Discussion 

We briefly compare both deflation schemes. Al- 
though both schemes use the same composite subspace 
Q m + JCk(A,x), they yield different function approxima- 
tions resulting from a different subspace decomposition. 

In the Schur deflation, the composite subspace is decom- 
posed in a sum of two orthogonal subspaces which are cou- 
pled by A, while in the LR-deflation the subspaces no longer 
mix, at the expense of losing orthogonality of the two sub- 
spaces. 

Accordingly, both schemes introduce different approxi- 
mations to f(A)x: the Schur deflation approximates the 
orthogonal projection of the solution vector on the total 
composite space using Eq. (29), while the LR-deflation 
first extracts the critical eigendirections and only approx- 
imates the orthogonal projection of the residual vector on 
the Krylov subspace using Eq. (19). Therefore, in the Schur 
deflation the components of f(A)x along the Schur vec- 
tors become more accurate as the Krylov subspace becomes 



larger, while in the LR-deflation the components of f(A)x 
along the critical eigendirections can be computed exactly, 
independently of the size of the Krylov subspace. This is 
probably the reason for the observation that, for fixed m 
and fc, the LR-deflation is slightly more accurate than the 
Schur deflation, see Fig. 4 below. 

Numerically, the LR-deflation has two advantages over 
the Schur deflation. First, its Arnoldi method does not re- 
quire the deflated directions to be (obliquely) projected out 
of the Krylov subspace because the subspaces do not mix. 
Second, this absence of coupling between the subspaces 
means that the LR-scheme has no analog of the Sylvester 
equation (32). 

A downside of the LR-deflation is that both left and 
right eigenvectors need to be computed in the initializa- 
tion phase, whereas the Schur deflation only needs the right 
eigenvectors. Hence, the Schur deflation will have a shorter 
initialization phase, unless one needs to operate with both 
f(A) and its adjoint f{A)\ in which case both sets of eigen- 
vectors are needed anyways (for the latter, the roles of left 
and right eigenvectors are interchanged). 

In the next section, we will present numerical results 
obtained with our modified Arnoldi method. 

6. Results 

We implemented the modified Arnoldi method proposed 
in the previous section to compute the sign function occur- 
ring in the overlap Dirac operator (1) of lattice QCD. 

First we discuss the critical eigenvalues of the 75-Wilson- 
Dirac operator H w (fi), which are needed for deflation and 
have to be computed once for any given SU (3) gauge con- 
figuration. Deflation is needed because of the existence of 
eigenvalues close to the imaginary axis. In Fig. 1 we show 
the spectrum of H w (/i) for a 4 4 lattice and a 6 4 lattice, 
using the same parameters as in Rcf. [7], i.e., m w = —2 
and gauge coupling (3 g = 5.1. These complete spectra were 
computed with L APACK [33] . This is a very costly calcu- 
lation, especially for the 6 4 lattice, which was done for the 
sole purpose of this numerical investigation but cannot be 
performed routinely during production runs. 

Although the eigenvalues of interest for deflation in the 
case of the sign function are those with smallest absolute 
real parts, we decided to deflate the eigenvalues with small- 
est magnitude instead. Numerically the latter are more eas- 
ily determined, and both choices yield almost identical de- 
flations for the 75-Wilson operator at nonzero chemical po- 
tential. The reason for this is that, as long as the chemical 
potential does not grow too large, the spectrum looks like 
a very narrow bow-tie shaped strip along the real axis (see 
Fig. 1), and the sets of eigenvalues with smallest absolute 
real parts and smallest magnitudes will coincide. 

In practice we compute the eigenvalues of H w with small- 
est magnitude with ARPACK. This package has an option 
to retrieve the eigenvalues with smallest magnitude without 
performing an explicit inversion, which would be very ex- 
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Fig. 1. Spectrum of H w (fj.) in Eq. (1) for a 4 4 lattice (top pane) 
and a 6 4 lattice (bottom pane), with /i = 0.3 and rrt w = —2. Note 
the difference in scale between real and imaginary axes. The gauge 
fields were generated using the Wilson plaquette action with gauge 
coupling /3 g = 5.1 [7]. 

pensive in this case. However, the use of this option requires 
the eigenvalues to be near the boundary of the convex hull 
of the spectrum. From Fig. 1 it is clear that the eigenval- 
ues closest to the origin are deep inside the interior of the 
convex hull and do not satisfy this requirement. Therefore 
we opted to compute the eigenvalues with smallest magni- 
tude of the squared operator H 2 . Clearly the eigenvalues 
with smallest magnitude will be the same for both oper- 
ators, as | A 2 1 = |A| 2 . The eigenvalues of arc given by 
z 2 = x 2 — y 2 + 2ixy, where x and y are the real and imag- 
inary parts of the eigenvalues z of H w . The spectra of 
for the 4 4 and 6 4 lattices are shown in Fig. 2, and clearly 
the eigenvalues with smallest magnitudes are now close to 
the boundary of the convex hull of the spectrum so that 
ARPACK can find them more easily. Since in this approach 
we square the matrix, there is the potential danger that 
small eigenvalues get spoiled just by the additional numer- 
ical round-off introduced when applying A twice. However, 
this should be noticeable only if these eigenvalues are com- 
parable in size to the round-off error. Our calculations are 
not affected by this problem. 

Obviously there is a trade-off between the number of 
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Fig. 2. Spectrum of H^(ij,) for a 4 4 lattice (top pane) and a 6 4 lattice 
(bottom pane), with fj, = 0.3 and /3 g = 5.1. 

deflated eigenvalues and the size of the Krylov subspace. 
A useful piece of information in this context is the ratio 
of the magnitude of the largest deflated eigenvalue over 
the largest overall eigenvalue, which is given in Table 1 for 
different numbers of deflated eigenvalues. A comparison of 
the values for both lattice sizes indicates that the number 
of eigenvalues with a magnitude smaller than a given ratio 
increases proportionally with the lattice volume. This is 
consistent with a spectral density of the small eigenvalues 
proportional to the lattice volume. This property is also 
apparent in Figs. 1 and 2, as the contours enclosing the 
spectra remain unchanged when the volume is increased. 
As a first guess we therefore expect that scaling m with the 
volume should yield comparable convergence properties of 
the method for various lattice sizes. 

The convergence of the method is illustrated in Fig. 3, 
where the accuracy of the approximation is shown as a func- 
tion of the Krylov subspace size for two different lattice 
sizes. The various curves correspond to different numbers 
of deflated eigenvalues. The results in the figure were com- 
puted using the LR-dcflation scheme. The Schur deflation 
yields similar results. 

Without deflation (m = 0) the Krylov subspace method 
would be numerically unusable because of the need of large 
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Fig. 3. Accuracy of the modified Arnoldi method for y = sgn(A)x. 
The non-Hcrmitian matrix is A = 75D w (£t), where Dv,(fi) is the 
Wilson-Dirac operator (2) at chemical potential fi = 0.3, and 
x = (1,1,..., 1). Top pane: 4 4 lattice with dim(A) = 3072, bot- 
tom pane: 6 4 lattice with dim(A) = 15552. The relative error 
e = Wii ~ y\\/\\y\\ ls shown as a function of the Krylov subspace size 
k for various numbers of deflated eigenvalues m using the LR-defla- 
tion. In order to compute the error e, the exact solution y was first 
determined using the spectral definition (4) and the full spectral 
decomposition computed with LAPACK. 

Krylov subspaces. Clearly, deflation highly improves the 
efficiency of the numerical method: as more eigenvalues are 
deflated, smaller Krylov subspaces are sufficient to achieve 



m 


max |A dcfl |/max |A all | 




4 4 lattice 


6 4 lattice 


2 


0.0040 


0.0016 


4 


0.0056 


0.0026 


8 


0.0119 


0.0035 


16 


0.0183 


0.0052 


32 


0.0351 


0.0084 


64 


0.0631 


0.0155 


128 




0.0286 



Table 1 

Ratio of largest deflated eigenvalue over largest eigenvalue for various 
numbers of deflated eigenvalues m for a 4 4 and a 6 4 lattice. 
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Fig. 4. Comparison of the accuracies achieved with both defla- 
tion schemes and the exact projection of y on the composite space 
flm + ICk(A,x). The relative error e = \\y — J/||/||j/|| is shown as a 
function of the Krylov subspace size k. For both lattice sizes the ac- 
curacy of the LR-dcflation is slightly better than that of the Schur 
deflation. Furthermore, the accuracy of the modified Arnoldi ap- 
proximation is very close to the best possible approximation in the 
composite subspace. 

a given accuracy. 

Furthermore, the deflation efficiency grows with increas- 
ing lattice volume. To reach an accuracy of 10 -8 for the 
4 4 lattice with 25 (~ 0.0081./V) deflated eigenvalues, one 
requires a Krylov subspace size k « 570 (« 0.19iV). How- 
ever, to reach the same accuracy for the 6 4 lattice with a 
comparable deflation of 128 (« 0.0082iV) critical eigenval- 
ues, one only requires k « 700 (« 0.0457V). Although the 
matrix size N is more than 5 times larger for the 6 4 lattice, 
the Krylov subspace only has to be expanded by a factor 
of 1.2 to achieve the same accuracy (when m is scaled pro- 
portional to N so that the ratio of Table 1 remains approx- 
imately constant for both lattice sizes). 

In Fig. 4 we compare the accuracy of the two deflation 
schemes described in Sec. 5.2 and 5.3. For an equal number 
of deflated eigenvalues and equal Krylov subspace size, the 
LR-dcflation seems systematically slightly more accurate 
than the Schur deflation. 

To assess the quality of the modified Arnoldi approxima- 
tion, it is interesting to compare the approximations (33) 
and (41) for f(A)x with the best approximation in the com- 
posite subspace, which corresponds to the orthogonal pro- 
jection (27) of f{A)x on fi TO + K.k(A, x), 

m k 

2/proj = QQh = 5>ty)* + X)(^»)«< ■ (42) 

i=l i=l 

The relative accuracy of this projection is also shown in 
Fig. 4. It is encouraging to note that the modified Arnoldi 
approximation is quite close to the exact projection 2/ pro j- 
In Table 2 we show the CPU time used by the modified 
Arnoldi method for the Schur and LR-dcflation schemes. 
The times needed to construct the orthonormal basis in the 
Krylov subspaces according to Eqs. (24) and (40) and to 
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compute the sign function of the Arnoldi Hesscnberg ma- 
trices are tabulated separately. The tabulated times were 
measured for an m = 32 deflation for the 4 4 lattice and 
m = 128 for the 6 4 lattice. 

The larger CPU times required by the Schur defla- 
tion mainly reflect the additional orthogonalization of the 
Arnoldi vectors with respect to the Schur vectors. The time 
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14.1 s 


k 


Arnoldi 


sgn(-f/) 


total 
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0.18 


0.03 


0.23 
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0.59 


0.21 
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1.22 


0.52 


1.75 
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2.05 


1.08 


3.16 
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3.12 


1.79 


4.93 


600 


4.37 
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800 


7.56 


6.69 


14.28 


900 
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18.92 
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12.68 
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4 4 lattice, m = 32 




LR-deflation 




initialization time: 27.5 s 


k 


Arnoldi 


sgn(H k ) 


total 


100 


0.12 


0.03 


0.15 


200 


0.45 


0.20 


0.66 


300 


1.01 


0.49 


1.51 


400 


1.77 


1.02 


2.82 


500 


2.76 


1.69 


4.47 


600 


3.94 


2.77 


6.74 


700 


5.36 


4.40 


9.79 


800 


6.96 


6.44 


13.44 


900 


8.84 


9.10 


17.98 


1000 


10.84 


12.33 


23.21 
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4 lattice, m = 128 
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Schur deflation 






LR-dcflation 




initialization time: 


884 s 




initialization time: 1713 s 


k 


Arnoldi 


sgn(H) 


total 




k 


Arnoldi 


sgn(-Hfc) 


total 


100 


2.03 


0.05 


2.13 




100 


0.66 


0.03 


0.75 


200 


5.16 


0.22 


5.45 




200 


2.39 


0.15 


2.62 


300 


9.27 


0.56 


9.91 




300 


5.16 


0.42 


5.69 


400 


14.59 


1.15 


15.85 




400 


9.01 


0.94 


10.06 


500 


20.95 


2.09 


23.17 




500 


13.96 


1.73 


15.84 


600 


28.12 


3.35 


31.61 




600 


20.03 


2.80 


22.98 


700 


36.81 


5.17 


42.15 




700 


27.09 


4.44 


31.70 


800 


46.32 


7.39 


53.88 




800 


35.09 


6.49 


41.78 


900 


56.83 


10.37 


67.39 




900 


44.38 


9.10 


53.70 


1000 


68.29 


13.88 


82.39 




1000 


54.74 


12.36 


67.34 



Table 2 

CPU time (in seconds) for varying Krylov subspace size k. Top row: 
4 4 lattice with m = 32, bottom row: 6 4 lattice with m = 128. Left 
panes: Schur deflation, right panes: LR-deflation. The time required 
by the initial calculation of the critical eigenvectors is given in the 
header of each block. The time needed to construct the Arnoldi basis 
in the Krylov subspace (column 2) is approximately proportional to 
Nk(k+2m) for the Schur deflation and Nk 2 for the LR-deflation. The 
time used by Roberts' method (34) to compute the sign function of 
the Hessenberg matrix (column 3) is 0(fc 3 ). The total time (column 
4) also includes the evaluation of Eq. (33) for the Schur deflation and 
Eq. (41) for the LR deflation. These timings were measured on an 
Intel Core 2 Duo 2.33GHz computer using optimized ATLAS BLAS 
routines [34], 



needed to compute the sign of the Hesscnberg matrix is 
also slightly larger for the Schur deflation as it involves the 
additional solution of the Sylvester Equation (32). For the 
same reasons, varying m for a given lattice size will only 
significantly change the timings for the Schur deflation 
(this m-dependence is not shown in the table). 

To summarize, the LR-deflation scheme has a somewhat 
better accuracy and requires less CPU time per iteration 
than the Schur deflation. The one advantage of the Schur 
deflation is that it only requires the initial computation 
of the right eigenvectors, while the LR-dcflation requires 
the computation of both left and right eigenvectors. The 
time needed to compute the critical eigenvectors of H w (n) 
is given in the headers of the four blocks in Table 2. The 
choice of deflation scheme depends on the number of vectors 
x for which sgn(H w )x needs to be computed. This will be 
the topic of future work on nested iterative methods for 
non-Hermitian matrices occurring during the inversion of 
the overlap operator. Of course, as mentioned in Sec. 5.4, if 
one needs to apply both sgn(i? w ) and its adjoint, then the 
LR-deflation will be the better choice. 



7. Conclusion 

In this paper we have proposed an algorithm to approx- 
imate the action of a function of a non-Hermitian matrix 
on an arbitrary vector, when some of the eigenvalues of the 
matrix lie in a region of the complex plane close to a dis- 
continuity of the function. 

The method approximates the solution vector in a com- 
posite subspace, i.e., a Krylov subspace augmented by the 
eigenvectors corresponding to a small number of critical 
eigenvalues. In this composite subspace two deflation vari- 
ants are presented based on different subspace decompo- 
sitions: the Schur deflation uses two coupled orthogonal 
subspaces, while the LR-dcflation uses two decoupled but 
non-orthogonal subspaces. 

The subspace decompositions are then used to compute 
Arnoldi-bascd function approximations in which the con- 
tribution of the critical eigenvalues is taken into account 
explicitly This deflation of critical eigenvalues allows for a 
smaller size of the Krylov subspace and is crucial for the 
efficiency of the method. 

For the sign function, deflation is particularly important 
because of its discontinuity along the imaginary axis. The 
method was applied to the overlap Dirac operator of lat- 
tice QCD at nonzero chemical potential, where deflation 
was shown to clearly enhance the efficiency of the method. 
If the overlap Dirac operator has to be inverted using some 
iterative method, each iteration will require the computa- 
tion of sgn(H w )x for some vector x. In such a situation the 
cost for computing the critical eigenvectors, which is done 
just once, is by far outbalanced by the smaller costs for 
each evaluation of the sign function. However, an impor- 
tant question that deserves further study is how the optimal 
number m of deflated eigenvectors depends on the volume 
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and how this influences the initialization time. This ques- 
tion could become performance relevant for large volumes. 

As mentioned above, our next steps include the appli- 
cation of the two-sided Lanczos method to the problem of 
approximating f(A)x for a non-Hcrmitian matrix, and the 
investigation of nested iterative methods for non-Hcrmitian 
matrices. Work in these directions is in progress. 

Appendix A. Sylvester equation 

In this appendix we describe a particularly simple algo- 
rithm to solve the special Sylvester equation 



TY -YH = C , 



(A.l) 



where T is an m x m upper triangular matrix, H is an n x n 
upper Hesscnberg matrix, and the right-hand side C and 
the unknown matrix Y are mxn matrices. Classical meth- 
ods to solve the Sylvester equation when T and H are full 
matrices are formulated in [35,36]. For triangular H and T 
the Sylvester equation can easily be solved by direct substi- 
tution, see, e.g., [14]. In principle, this algorithm could also 
be applied to Eq. (A.l) if the upper Hessenberg matrix H is 
first transformed into triangular form using a Schur decom- 
position. Here we present a more efficient approach, which 
can be regarded as a natural extension of the algorithm for 
the triangular Sylvester equation when one of the matrices 
is upper Hesscnberg instead of triangular. Blocking would 
also be possible (cf., e.g., [37]), but since the solution of the 
Sylvester equation accounts only for a small portion of the 
overall time we did not pursue this issue further. 

Written out explicitly, the element of the matrix 

equation (A.l) is 

m max(j'+l,n) 

^ T ik y k j - //,; //,., <•,, (A. 2) 

k—i k—1 

for i = 1 , . . . , m and j = 1 , . . . , n. 

This matrix equation can be solved row by row from 
bottom to top, since Eq. (A. 2) can be solved for row i once 
rows i + 1, . . . , m are known, 



max(j+l,n) 
k=l 

Em rp 
k=i+l 1 ikVkj- 



(A.3) 



with 

Inside row i one can solve for the element ytj+i as a 
function of the elements to its left, 



1 



Vi,3 + l 



<kj - T li y l j + ^2 VikHkj 



fe=l 



(A.4) 



for columns j = 1, . . . , n — 1. From Eq. (A.4) it follows that 
all elements of row i can be written as 



(A.5) 



where the coefficients a,j and bj can be computed explicitly 
from the recurrence relations 



Oj+l 



H 



3+1. j 



-Tucij + ^ a k H, 



k=l 



b J+ i = - 



H 



j+i,j 



Tubj 



J2b k H, 



k=l 



(A.6) 



starting from a\ = 1, b\ = 0. After substituting Eq. (A.5) 
with the known coefficients (A.6), element (i,n) of 
Eq. (A.3) can be solved for yn, 



ya 



n — Tub n + J2k=l bkHkn 
Tii&n — -\ &kHkn 



(A.7) 



Once yn is known, all other elements y^ of row i can be 
computed using Eq. (A.5) with coefficients (A.6). 
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